Journal of Proteome Research
● American Chemical Society (ACS)
All preprints, ranked by how well they match Journal of Proteome Research's content profile, based on 215 papers previously published here. The average preprint has a 0.14% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Roarty, C.; Mills, C.; Tonry, C.; Cosgrove, P.; Norman-Bruce, H.; Groves, H.; Watson, C.; Waterfield, T.
SARS-CoV-2 infection in children results in a wide range of clinical outcomes. Paediatric Multisystem Inflammatory Syndrome temporally associated with COVID-19 (PIMS-TS) occurs weeks after a SARS-CoV-2 infection and results in severe illness. This protocol describes a study to fully characterize the circulating proteome of children who have PIMS-TS, the proteome of healthy children who have previously been infected with SARS-CoV-2, and the proteome of febrile children with a confirmed invasive infection. Orthogonal proteomic techniques will be used to provide a deep proteomic characterization.
Ramtirtha, Y.; Madhusudhan, M. S.
SUMOylation is a post-translational modification that involves covalent attachment of the SUMO C-terminus to the side-chain amino group of lysine residues in target proteins. Disruption of the modification has been linked to neurodegenerative diseases and cancer. Recent improvements in mass spectrometry-coupled proteomics experiments have enabled high-throughput identification of SUMOylated lysines in mammalian cells. One such study was Hendriks et al., 2018, wherein the authors identified SUMOylated lysines in human and mouse cells. Information from this study was used as input to a sequence homology based method to annotate putative SUMOylatable lysines in the proteome of the fruit fly Drosophila melanogaster. 5283 human and 468 mouse SUMOylated proteins led to the identification of 8539 and 1700 fly homologs and putative SUMOylation sites therein, respectively. Clustering analysis was carried out on these annotated sites to obtain three types of information. The first type revealed amino acid preferences in the local sequence vicinity of the annotated sites. This exercise confirmed that ψ-K-x-(E/D), where ψ = I/V/L, is the most frequently occurring sequence motif involving SUMOylated lysines. The second type revealed the protein families that contain the annotated sites. Results from this exercise reveal that members of thousands of protein families contain annotated SUMOylation sites. The third type revealed the preferred biological and cellular functions of proteins containing the annotated lysines. This exercise revealed that the nucleus and transcription are the preferred cellular localization and biological function, respectively.
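As a small illustration of the consensus motif named in this abstract, the sketch below scans a protein sequence for ψ-K-x-(E/D) with ψ = I/V/L. It is only a pattern match, not the homology-based annotation method the authors describe; the function name `find_consensus_sites` and the toy sequence are hypothetical.

```python
import re

# Consensus SUMOylation motif psi-K-x-(E/D), with psi = I/V/L.
# The lookahead makes the scan report overlapping motif instances too.
MOTIF = re.compile(r"(?=([IVL]K[A-Z][ED]))")


def find_consensus_sites(sequence):
    """Return 1-based positions of lysines that sit in a psi-K-x-(E/D) motif."""
    # m.start() is the 0-based index of psi; the lysine is the next residue,
    # so its 1-based position is m.start() + 2.
    return [m.start() + 2 for m in MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence, not a real protein.
example = "MAVKTEGGLKQDPIKAE"
sites = find_consensus_sites(example)
print(sites)
```

A match only says that a lysine sits in the consensus context; it does not by itself indicate that the site is modified.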
Plank, M. J.
Mass spectrometry based phospho-proteomics is a widely used approach to assess protein phosphorylation. Intensities of phospho-peptide ions are obtained by integrating the MS signal over their chromatographic peaks. How individual peptide measurements mapping to the same phospho-site are combined to quantify that site is, however, in most cases hidden from researchers conducting, reviewing, and reading these studies. I here describe pSiteExplorer, an R script that visualizes the peak intensities associated with phospho-sites in MaxQuant output tables. Barplots are displayed of MS intensities originating from phospho-peptides with distinct amino acid sequences due to missed cleavages, from peptides with different numbers of phosphates, and from all off-line chromatographic fractions and charge states. This tool will help in gaining a deeper insight into phospho-site quantifications by contrasting individual and summed phospho-peptide intensities with the site-level values derived by MaxQuant. This will support the validation of quantification results, for example, for the selection of candidates for follow-up studies.
Pipart, J.; Holstein, T.; Muth, T.; Martens, L.
Recent years, with the global SARS-CoV-2 pandemic, have shown the importance of strain level identification of viral pathogens. While the gold-standard approach for unknown viral sample identification remains genomics, studies have shown the necessity and advantages of orthogonal experimental approaches such as proteomics, based on proteomic database search methods. The databases required as references for both proteins and genome sequences are known to be biased towards certain taxa, such as pathogenic strains or species, or common model organisms. Additionally, the proteomic databases are not as comprehensive as the genomic databases. We present MultiStageSearch, an iterative database search approach for the taxonomic identification of viral samples combining proteomic and genomic databases. The potentially present species and strains are inferred using a generalist proteomic reference database. MultiStageSearch then automatically creates a proteogenomic database. This database is further pre-processed by filtering for duplicates as well as clustering of identical ORFs to address potential bias present in the genomic database. Furthermore, the workflow is independent of the strain level NCBI taxonomy, enabling the inference of strains that are not present in the NCBI taxonomy. We performed a benchmark on several viral samples to demonstrate the performance of the strain level taxonomic inference. The benchmark shows superior performance compared to state of the art methods for untargeted strain level inference using proteomic data while being independent of the NCBI taxonomy at strain level.
Vande Moortele, T.; Devlaminck, B.; Van de Vyver, S.; Van Den Bossche, T.; Martens, L.; Dawyndt, P.; Mesuere, B.; Verschaffelt, P.
Unipept, a pioneering software tool in metaproteomics, has significantly advanced the analysis of complex ecosystems by facilitating both taxonomic and functional insights from environmental samples. From the outset, Unipept's capabilities focused on tryptic peptides, utilizing the predictability and consistency of trypsin digestion to efficiently construct a protein reference database. However, the evolving landscape of proteomics and emerging fields like immunopeptidomics necessitate a more versatile approach that extends beyond the analysis of tryptic peptides. In this article, we present a significant update to the underlying index structure of Unipept, which is now powered by a Sparse Suffix Array index. This advancement enables the analysis of semi-tryptic peptides, peptides with missed cleavages, and non-tryptic peptides such as those encountered in other research fields, for example immunopeptidomics (e.g. MHC- and HLA-peptides). This new index benefits all tools in the Unipept ecosystem such as the web application, desktop tool, API and command line interface. A benchmark study highlights significantly improved performance in handling missed cleavages, while preserving the same level of accuracy.
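To show why a suffix-based index can match peptides regardless of cleavage rules, the sketch below builds a plain (dense) suffix array over a concatenated protein string and finds every occurrence of an arbitrary peptide by binary search. This is not Unipept's implementation and not the sparse variant named in the abstract; the function names, the "-" separator and the two toy proteins are all assumptions made for illustration.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, sorted lexicographically."""
    # Naive construction, adequate for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions of pattern in text using binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


# Two hypothetical toy proteins joined by a separator that never occurs in peptides.
text = "-".join(["MKTAYIAK", "GGAYIAKR"])
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "AYIAK"))  # peptide ending in K
print(find_occurrences(text, sa, "YIA"))    # peptide with no tryptic ends
```

Because every suffix is indexed, the query does not need to start or end at a cleavage site, which is the property that matters for semi-tryptic and non-tryptic peptides. A sparse suffix array keeps only a subset of suffixes to save memory, at the cost of a more involved lookup that is not shown here.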
Bordag, N.; Zuegner, E.; Lopez-Garcia, P.; Kofler, S.; Tomberger, M.; Al-Baghdadi, A.; Schweiger, J.; Erdem, Y.; Magnes, C.; Hidekazu, S.; Wadsak, W.; Erxleben, B.-T.; Prietl, B.
With its greatly simplified handling and fast result delivery, PESI-MS opens up high-throughput use in routine settings. In health care and research, pre-analytical errors often remain undetected and disrupt diagnosis, treatment, clinical studies and biomarker validations, incurring high costs. This proof-of-principle study investigates the suitability of PESI-MS for robust, routine sample quality evaluation. One of the most common pre-analytical quality issues in blood sampling is prolonged transportation time from bedside to laboratory, which promptly changes the metabolome. Here, human blood (n=50) was processed immediately or with a time delay of 3 h. The developed sample preparation method delivers ready-to-measure extracts in <8 min. PESI-MS spectra were measured in both ionization modes in 2 min from as little as 2 µl plasma, allowing 3 replicate measurements. The mass spectra contained 1200 stable features covering a broad chemical space that includes major metabolic classes (e.g. fatty acids, lysolipids, lipids). The time delay of 3 h was predictable using 18 features with AUC > 0.95 across various machine learning methods and was robust against loss of single features. Our results serve as a first proof of principle for the unique advantages of PESI-MS in sample quality assessments. The results pave the way towards a fully automated, cost-efficient, user-friendly, robust and fast quality assessment of human blood samples from minimal sample amounts.
Christianson, K. E.; Jaffe, J. D.; Carr, S. A.; Vaca Jacome, A. S.
Data-independent acquisition (DIA) is a powerful mass spectrometry method that promises higher coverage, reproducibility, and throughput than traditional quantitative proteomics approaches. However, the complexity of DIA data caused by fragmentation of co-isolating peptides presents significant challenges for confident assignment of identity and quantity, information that is essential for deriving meaningful biological insight from the data. To overcome this problem, we previously developed Avant-garde (AvG), a tool for automated signal refinement of DIA and other targeted mass spectrometry data. AvG is designed to work alongside existing tools for peptide detection to address the reliability and quantitative suitability of signals extracted for the identified peptides. While its use is straightforward and offers efficient refinement for small datasets, the execution of AvG for large DIA datasets is time-consuming, especially if run with limited computational resources. To overcome these limitations, we present here an improved, cloud-based implementation of the AvG algorithm deployed on Terra, a user-friendly cloud-based platform for large-scale data analysis and sharing, as an accessible and standardized resource to the wider community.
Camacho, O. M.; Ramsbottom, K.; Collins, A.; Jones, A. R.
Phosphorylation is a post-translational modification of great interest to researchers due to its relevance in many biological processes. LC-MS/MS techniques have enabled high-throughput data acquisition, with studies claiming identification and localisation of thousands of phosphosites. The identification and localisation of phosphosites emerge from different analytical pipelines and scoring algorithms, with uncertainty embedded throughout the pipeline. For many pipelines and algorithms, arbitrary thresholding is used, but little is known about the actual global false localisation rate in these studies. Recently, the use of decoy amino acids has been suggested to estimate global false localisation rates of phosphosites amongst the peptide-spectrum matches reported. We here describe a simple pipeline aiming to maximize the information extracted from these studies by objectively collapsing from peptide-spectrum match to peptidoform-site level, as well as combining findings from multiple studies while keeping track of false localisation rates. We show that the approach is more effective than current processes that use a simpler mechanism for handling phosphosite identification redundancy within and across studies. In our case study using 8 rice phosphoproteomics data sets, 6,368 unique sites were confidently identified using our decoy approach, compared to 4,687 using traditional thresholding, in which false localisation rates are unknown.
Bachman, J. A.; Gyori, B. M.; Sorger, P. K.
Protein phosphorylation regulates numerous cellular processes and is highly studied in biology. However, the analysis of phosphoproteomic datasets remains challenging due to limited information on upstream regulators of phosphosites, which is fragmented across multiple curated databases and unstructured literature. When aggregating information on phosphosites from six databases and three text mining systems, we found that a substantial proportion of phosphosites were mentioned at residue positions not matching the reference sequence. These errors were often attributable to the use of residue numbers from non-canonical protein isoforms, mouse or rat proteins, or post-translationally processed proteins. Non-canonical site numbering is also prevalent in mass spectrometry datasets from large-scale efforts such as the Clinical Proteomic Tumor Analysis Consortium (CPTAC). To address these issues, we developed ProtMapper, an open-source Python tool that automatically normalizes site positions to human protein reference sequences. We used ProtMapper coupled with the INDRA knowledge assembly system to create a corpus of 37,028 regulatory annotations for 16,332 sites - to our knowledge, the most comprehensive corpus of literature-derived information about phosphosite regulation currently available. This work highlights how automated phosphosite normalization coupled to text mining and knowledge assembly allows researchers to leverage phosphosite information that exists within the scientific literature.
Ramsbottom, K. A.; Prakash, A. A.; Perez-Riverol, Y.; Martin Camacho, O.; Martin, M.; Vizcaino, J. A.; Deutsch, E. W.; Jones, A. R.
Phosphoproteomics methods are commonly employed in labs to identify and quantify the sites of phosphorylation on proteins. In recent years, various software tools have been developed, incorporating scores or statistics related to whether a given phosphosite has been correctly identified, or to estimate the global false localisation rate (FLR) within a given data set for all sites reported. These scores have generally been calibrated using synthetic data sets, and their statistical reliability on real datasets is largely unknown. As a result, there is a considerable problem in the field of reporting incorrectly localised phosphosites, due to inadequate statistical control. In this work, we develop the concept of using scoring and ranking modifications on a decoy amino acid, i.e. one that cannot be modified, to allow for independent estimation of global FLR. We test a variety of different amino acids to act as the decoy, on both synthetic and real data sets, demonstrating that the amino acid selection can make a substantial difference to the estimated global FLR. We conclude that while several different amino acids might be appropriate, the most reliable FLR results were achieved using alanine and leucine as decoys, although we have a preference for alanine due to the risk of potential confusion between leucine and isoleucine. We propose that the phosphoproteomics field should adopt the use of a decoy amino acid, so that there is better control of false reporting in the literature, and in public databases that re-distribute the data. Data are available via ProteomeXchange with identifier PXD028840.
Orsburn, B.
Epigenetic programming has been shown to play a role in nearly every human system and disease where anyone has thought to look. However, the levels of heterogeneity at which epigenetic or epiproteomic modifications occur at single cell resolution across a population remain elusive. While recent advances in sequencing technology have allowed between 1 and 3 histone post-translational modifications to be analyzed in each single cell, over twenty separate chemical PTMs are known to exist, allowing thousands of possible combinations. Single cell proteomics by mass spectrometry (SCP) is an emerging technology in which hundreds or thousands of proteins can be directly quantified in typical human cells. As the proteins detected and quantified by SCP are heavily biased toward proteins of highest abundance, chromatin proteins are an attractive target for analysis. To this end, I applied SCP to the analysis of cancer cells treated with mocetinostat, a class specific histone deacetylase inhibitor. I find that 16 PTMs can be confidently identified and localized with high site specificity in single cells. In addition, the high abundance of histone proteins allows higher throughput methods to be utilized for SCP than previously described. While quantitative accuracy suffers when analyzing more than 700 cells per day, 9 histone proteins can be measured in single cells analyzed at even 3,500 cells per day, a throughput 10-fold greater than any previous report. In addition, the unbiased global approach utilized herein identifies a previously uncharacterized response to this drug through the S100-A8/S100-A9 protein complex partners. This response is observed in nearly every cell of the over 1,000 analyzed in this study, regardless of the relative throughput of the method utilized.
While limitations exist in the methods described herein, current technologies can easily improve upon the results presented here to allow comprehensive analysis of histone PTMs to be performed in any mass spectrometry lab. All raw and processed data described in this study has been made publicly available through the ProteomeXchange/MASSIVE repository system as MSV000093434.
Ramsbottom, K. A.; Prakash, A. A.; Perez-Riverol, Y.; Camacho, O. M.; Sun, Z.; Kundu, D.; Bowler-Barnett, E.; Martin, M.; Fan, J.; Chebotarov, D.; McNally, K.; Deutsch, E. W.; Vizcaino, J. A.; Jones, A. R.
Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have re-analysed publicly available mass spectrometry proteomics datasets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,522 phosphosites on serine, threonine and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety. The data was clustered to identify groups of sites with similar patterns across rice family groups, for example those highly conserved in Japonica, but mostly absent in Aus type rice varieties - known to have different responses to drought. These resources can assist rice researchers to discover alleles with significantly different functional effects across rice varieties. The data has been loaded into UniProt Knowledge-Base - enabling researchers to visualise sites alongside other data on rice proteins, e.g. structural models from AlphaFold2, PeptideAtlas and the PRIDE database - enabling visualisation of source evidence, including scores and supporting mass spectra.
Wilding-McBride, D.; Infusini, G.; Webb, A. I.
The determination of relative protein abundance in label-free data-dependent acquisition (DDA) LC-MS/MS proteomics experiments is hindered by the stochastic nature of peptide detection and identification. Peptides with an abundance near the limit of detection are particularly affected. The possible causes of missing values are numerous, including sample preparation, variation in sample composition and the corresponding matrix effects, instrument and analysis software settings, instrument and LC variability, and the tolerances used for database searching. There have been many approaches proposed to computationally address the missing values problem, predominantly based on transferring identifications from one run to another by data realignment, as in MaxQuant's matching between runs (MBR) method, and/or statistical imputation. Imputation transfers identifications by statistical estimation of the likelihood the peptide is present based on its presence in other technical replicates but without probing the raw data for evidence. Here we present a targeted extraction approach to resolving missing values without modifying or realigning the raw data. Our method, which forms part of an end-to-end timsTOF processing pipeline we developed called Targeted Feature Detection and Extraction (TFD/E), predicts the coordinates of peptides using machine learning models that learn the delta of each peptide's coordinates from a reference library. The models learn the variability of a peptide's location in 3D space from the variability of known peptide locations around it. Rather than realigning or altering the raw data, we create a run-specific lens through which to observe the data, targeting a location for each peptide of interest and extracting it. By also creating a method for extracting decoys, we can estimate the false discovery rate (FDR). Our method outperforms MaxQuant and MSFragger by achieving substantially fewer missing values across an experiment of technical replicates.
The software has been developed in Python using Numpy and Pandas and open sourced with an MIT license (DOI 10.5281/zenodo.6513126) to provide the opportunity for further improvement and experimentation by the community. Data are available via ProteomeXchange with identifier PXD030706.
Author Summary: Missed identifications of peptides in data-dependent acquisition (DDA) proteomics experiments are an obstacle to the precise determination of which proteins are present in a sample and their relative abundance. Efforts to address the problem in popular analysis workflows include realigning the raw data to transfer a peptide identification from one run to another. Another approach is statistically analysing peptide identifications across an experiment to impute peptide identifications in runs in which they were missing. We propose a targeted extraction technique that uses machine learning models to construct a run-specific lens through which to examine the raw data and predict the coordinates of a peptide in a run. The models are trained on differences between observations of confidently identified peptides in a run and a reference library of peptide observations collated from multiple experiments. To minimise the risk of drawing unsound experimental conclusions based on an unknown rate of false discoveries, our method provides a mechanism for estimating the false discovery rate (FDR) based on the misclassification of decoys as target features. Our approach outperforms the popular analysis tool suites MaxQuant and MSFragger/IonQuant, and we believe it will be a valuable contribution to the proteomics toolbox for protein quantification.
Hinkle, T. B.; Bakalarski, C. E.
Selection and application of protein inference algorithms can have a significant impact on the data output from tandem mass spectrometry (MS/MS) experiments, yet its use is often an afterthought in proteomics research due to the inability to apply different inference algorithms in existing analysis systems today. PyProteinInference provides a comprehensive suite of tools to guide researchers through the application of multiple inference algorithms and computation of protein-level, set-based false discovery rates (FDR) from tandem mass spectrometry (MS/MS) data using a unified interface. Here, we describe the software and its application to a K562 whole-cell lysate as well as in a CRAF affinity-purification mass spectrometry experiment to demonstrate its utility in facilitating conclusions about underlying biological mechanisms in proteomic data.
Jones, J.
Mass spectrometry methods of peptide identification involve comparing observed tandem spectra with in-silico derived spectrum models. Presented here is a proteomics search engine that offers a new variation of the standard approach, with improved results. The proposed method employs information theory and probabilistic information retrieval on a pre-computed and indexed fragmentation database, generating a peptide-to-spectrum match (PSM) score modeled on fragment ion frequency. As a result, the direct application of modern document mining allows the collection of peptides to be treated as a corpus and the corresponding fragment ions as indexable words, leveraging ready-built search engines and common predefined ranking algorithms. Fast and accurate PSM matches are achieved, yielding a 5-10% higher rate of peptide identities than current database mining methods. Immediate applications of this search engine are aimed at identifying peptides from large sequence databases consisting of homologous proteins with minor sequence variations, such as genetic variation expected in the human population.
Competing Interest Statement: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Researchers have commercial interests in applications of the technology described herein.
Prensner, J. R.; Abelin, J. G.; Kok, L. W.; Clauser, K. R.; Mudge, J. M.; Ruiz-Orera, J.; Bassani-Sternberg, M.; Deutsch, E. W.; van Heesch, S.
Ribosome profiling (Ribo-seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of non-canonical sites of ribosome translation outside of the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7,000 non-canonical open reading frames (ORFs) are translated, which, at first glance, has the potential to expand the number of human protein-coding sequences by 30%, from ~19,500 annotated CDSs to over 26,000. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product and what fraction of those can be understood as proteins according to conventional understanding of the term. Adding further complication is the fact that published estimates of non-canonical ORFs vary widely by around 30-fold, from several thousand to several hundred thousand. The summation of this research has left the genomics and proteomics communities both excited by the prospect of new coding regions in the human genome, but searching for guidance on how to proceed. Here, we discuss the current state of non-canonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein-coding".
In brief: The human genome encodes thousands of non-canonical open reading frames (ORFs) in addition to protein-coding genes. As a nascent field, many questions remain regarding non-canonical ORFs. How many exist? Do they encode proteins? What level of evidence is needed for their verification? Central to these debates has been the advent of ribosome profiling (Ribo-seq) as a method to discern genome-wide ribosome occupancy, and immunopeptidomics as a method to detect peptides that are processed and presented by MHC molecules and not observed in traditional proteomics experiments.
This article provides a synthesis of the current state of non-canonical ORF research and proposes standards for their future investigation and reporting.
Highlights:
- Combined use of Ribo-seq and proteomics-based methods enables optimal confidence in detecting non-canonical ORFs and their protein products.
- Ribo-seq can provide more sensitive detection of non-canonical ORFs, but data quality and analytical pipelines will impact results.
- Non-canonical ORF catalogs are diverse and span both high-stringency and low-stringency ORF nominations.
- A framework for standardized non-canonical ORF evidence will advance the research field.
Palmblad, M.
In a recent Journal of Proteome Research paper, I described some general properties and constraints of a hypothetical next generation of proteomics technology based on single-molecule peptide sequencing. This work prompted many interesting questions, both from the reviewers of the initial manuscript and later from readers and colleagues. This follow-up paper addresses some of these questions by clarifying the original results, considering alternative metrics, and presenting a number of new simulations. Specifically, the discriminative power of individual amino acids is revisited, simulating additional proteolytic agents. These simulations show that allowing missed cleavages generally increases the discriminative power of the amino acids in the proteolytic motif. Additional simulations show that the effect of the modelled non-ideal conditions on the number of proteins lacking proteotypic reads is very small, and that the average number of proteotypic reads per protein follows the same rule on the performance of the optimal choice of labeled amino acids as the number of distinguishable proteins in NeXtProt. The goal of this paper is to expand prior results and continue the scientific discussion on the possibilities of future proteomics technologies.
Mosedale, D. E.; Sharp, T.; de Graff, A.; Grainger, D.
Type 2 diabetes mellitus (T2DM) is a rapidly increasing threat to global health, which brings with it a demand for better treatments. This study aimed to identify differences in the proteome of patients with T2DM to identify new targets for therapeutic intervention. We used a highly reproducible bottom-up proteomics protocol to investigate differences in protein, peptide and post-translational modifications between subjects with T2DM and matched controls in an untargeted manner. The serum proteome was remarkably similar at the protein level, with no differences between the subject groups across 175 proteins and five orders of magnitude. Strong associations were found, however, between fasting serum glucose levels and glycations of abundant serum proteins, including sites on apolipoprotein A1, apolipoprotein A2 and α2-macroglobulin. We also investigated proteome differences associated with BMI, and found all three components of the ternary complex (IGF-binding protein complex acid-labile subunit (ALS), IGF-binding protein 3 (IGFBP-3) and IGF-2) were strongly negatively associated with BMI. The results show the power of a proteomics protocol optimised for precision rather than depth of coverage, which here has identified strong correlations between physiological measurements and very low abundance post-translational modifications. In T2DM any differences in the serum proteome are very small, and likely a consequence rather than a cause of hyperglycaemia.
Article highlights:
- Our goal was to use high-precision label-free bottom-up LC-MS/MS proteomics to investigate differences in the proteome of patients with T2DM and controls, and potentially identify novel targets for future research.
- The serum proteome is remarkably similar in patients with T2DM and controls, with the only major difference being glycations of abundant serum proteins.
- All three components of the ternary complex (comprised of ALS, IGFBP-3 and IGF-2) were strongly negatively associated with BMI.
- The results highlight the power of a proteomics study designed with three key features at its core: a proteomics protocol optimised for precision rather than depth of coverage; an open bioinformatics approach investigating proteins, peptides and PTMs without prior assumptions about which features are important; and analysis of individual subject samples so that results take into account person-to-person variability.
Sun, S.; Zheng, Z.; Wang, J.; Li, F.; He, A.; Tan, C. S. H.
Show abstract
The vast majority of cellular activities are carried out by protein complexes that assemble dynamically in response to cellular needs and environmental cues. Large-scale efforts have uncovered a large repertoire of functionally uncharacterized protein complexes, which necessitates new strategies to delineate their roles in various cellular activities and diseases. Thermal proximity co-aggregation (TPCA) profiling could be readily deployed to simultaneously characterize the dynamics of hundreds to thousands of protein complexes in situ across different cellular conditions. Toward this goal, we have optimized the original method both experimentally and computationally. In this new iteration, termed Slim-TPCA, fewer temperatures are used, which increases throughput by over 3X, while new scoring metrics and statistical evaluation result in minimal compromise in coverage and the detection of more relevant protein complexes. Overall, fewer samples are needed, false positives from batch effects are minimized and statistical evaluation time is reduced by two orders of magnitude. We applied Slim-TPCA to profile the state of protein complexes in K562 cells under different durations of glucose deprivation. More protein complexes are found dissociated based on TPCA signature, in accordance with the expected downregulation of most cellular activities. These complexes include the 55S ribosome and various respiratory complexes in mitochondria, revealing the utility of TPCA for studying protein complexes in organelles. On the other hand, protein complexes involved in protein transport and degradation are found increasingly associated, revealing their involvement in metabolic reprogramming during glucose deprivation. In summary, Slim-TPCA is an efficient strategy for proteome-wide characterization of protein complexes. The algorithmic improvements of Slim-TPCA are available as a Python package at https://pypi.org/project/Slim-TPCA/
Spick, M.; Isherwood, C. M.; Gethings, L.; Hassanin, H.; van der Veen, D. R.; Skene, D. J.; Johnston, J. D.
Show abstract
Time-of-day variation in the molecular profile of biofluids and tissues is a well-described phenomenon, but - especially for proteomics - is rarely considered in terms of the challenges this presents to reproducible biomarker identification. In this work we demonstrate these confounding issues using a small-scale proteomics analysis of male participants in a constant routine protocol following an 8-day laboratory study, in which sleep-wake, light-dark and meal timings were controlled. We provide a case study analysis of circadian and ultradian rhythmicity in proteins in the complement and coagulation cascades, as well as apolipoproteins, and demonstrate that rhythmicity increases the risk of Type II errors due to the reduction in statistical power from increased variance. For the proteins analysed herein we show that to maintain statistical power if chronobiological variation is not controlled for, n should be increased (by between 9% and 20%); failure to do so would increase {beta}, the chance of Type II error, from a baseline value of 20% to between 22% and 28%. Conversely, controlling for rhythmic time-of-day variation in study design offers the opportunity to improve statistical power and reduce the chances of Type II errors. Indeed, control of time-of-day sampling is a more cost-effective strategy than increasing sample sizes. We recommend that best practice in proteomics study design should account for temporal variation as part of sampling strategy where possible. Where this is impractical, we recommend that additional variance from chronobiological effects be considered in power calculations, that time of sampling be reported as part of study metadata, and that researchers reference any previously identified rhythmicity in biomarkers and pathways of interest. 
These measures would help guard against both false and missed discoveries, and improve reproducibility, especially in studies looking at biomarkers, pathways or conditions with a known chronobiological component.
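The link between extra variance, sample size and Type II error described in this abstract can be illustrated with a short sketch. It is not the authors' calculation: it assumes a two-sided, two-sample comparison under a normal approximation, alpha = 0.05, and treats uncontrolled time-of-day variation as a simple multiplier on the variance. The function names and the inflation factors 1.09 and 1.20 (chosen to mirror the quoted 9% and 20% increases in n) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def n_per_group(effect, variance, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-sample z-test.

    Assumption: equal group sizes and a known, common variance.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * variance / effect ** 2)


def type_ii_error(variance_inflation, alpha=0.05, baseline_power=0.80):
    """Beta when variance grows by `variance_inflation` but n is unchanged.

    Assumption: the study was sized to reach `baseline_power` before
    the extra (for example chronobiological) variance was added.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(baseline_power)
    # The standardized effect shrinks with the square root of the variance.
    shifted = (z_alpha + z_power) / variance_inflation ** 0.5
    return 1 - STD_NORMAL.cdf(shifted - z_alpha)


if __name__ == "__main__":
    for inflation in (1.00, 1.09, 1.20):  # illustrative values only
        n = n_per_group(effect=1.0, variance=inflation)
        beta = type_ii_error(inflation)
        print(f"inflation={inflation:.2f}  n per group={n}  beta={beta:.3f}")
```

Under these assumptions the required n scales in direct proportion to the variance, and leaving n unchanged raises beta from 0.20 to roughly 0.23 and 0.28 for the two inflation factors, in line with the range quoted in the abstract.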